03 | Shallow Neural Nets

Max Pellert (https://mpellert.at)

Deep Learning for the Social Sciences

Shallow neural networks

1D regression model is obviously limited

We want to be able to describe input/output relationships that are not lines

We want multiple inputs

We want multiple outputs

Shallow neural networks

Flexible enough to describe arbitrarily complex input/output mappings

Can have as many inputs as we want

Can have as many outputs as we want

1D Linear Regression

Example shallow network

This model has 10 parameters (θ₁₀, θ₁₁, θ₂₀, θ₂₁, θ₃₀, θ₃₁, ϕ₀, ϕ₁, ϕ₂, ϕ₃):

  • Represents a family of functions

  • Parameters determine particular function

  • Given parameters can perform inference (run equation)

  • Given training dataset:

      ◦ Define loss function (least squares)

      ◦ Change parameters to minimize loss function
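The points above can be sketched in code. This is a minimal NumPy version of the 10-parameter network (1 input, 3 hidden units, 1 output) plus a least-squares loss; the parameter layout (`phi`, `theta`) is a notational assumption, not fixed by the slides.

```python
import numpy as np

def shallow_net(x, phi, theta):
    """Forward pass of a 1-input, 3-hidden-unit, 1-output shallow network.

    phi:   [phi_0, phi_1, phi_2, phi_3]            (output bias + 3 weights)
    theta: 3x2 array, row d = [theta_d0, theta_d1] (bias + slope of unit d)
    10 parameters in total; fixing them picks one function from the family.
    """
    pre = theta[:, 0] + theta[:, 1] * x  # three linear functions
    h = np.maximum(pre, 0.0)             # ReLU -> hidden units
    return phi[0] + phi[1:] @ h          # weighted sum plus output bias

def least_squares_loss(xs, ys, phi, theta):
    """Sum of squared errors over a training dataset."""
    preds = np.array([shallow_net(x, phi, theta) for x in xs])
    return np.sum((preds - np.asarray(ys)) ** 2)
```

Training would then mean adjusting the 10 numbers in `phi` and `theta` to make `least_squares_loss` as small as possible.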

Three example shallow networks

Piecewise linear functions with three joints

Break down into two parts:

    y = ϕ₀ + ϕ₁h₁ + ϕ₂h₂ + ϕ₃h₃

where:

    h₁ = a[θ₁₀ + θ₁₁x],  h₂ = a[θ₂₀ + θ₂₁x],  h₃ = a[θ₃₀ + θ₃₁x]

and a[·] denotes the ReLU activation function.

Building up a shallow network

  1. Compute the three linear functions

  2. Pass through ReLU functions (creates hidden units)

  3. Weight the hidden units

  4. Sum the weighted hidden units to create output
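The four steps above, written out one by one in NumPy. The parameter values here are made up purely for illustration:

```python
import numpy as np

# hypothetical parameter values, chosen only for illustration
theta = np.array([[0.0, 1.0], [1.0, -1.0], [-0.5, 0.5]])  # rows: [theta_d0, theta_d1]
phi = np.array([-0.2, 1.0, -1.0, 2.0])                    # [phi_0, phi_1, phi_2, phi_3]
x = 1.5

pre = theta[:, 0] + theta[:, 1] * x  # 1. compute the three linear functions
h = np.maximum(pre, 0.0)             # 2. pass through ReLU (creates hidden units)
weighted = phi[1:] * h               # 3. weight the hidden units
y = phi[0] + weighted.sum()          # 4. sum them (plus output bias) -> output
```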

Other examples of shallow networks

Example shallow network = piecewise linear functions

1 “joint” per ReLU function

Activation patterns

Which hidden units are activated in the shaded region?

Shaded region:

Unit 1 active

Unit 2 inactive

Unit 3 active
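The activation pattern of a point is just the sign of each pre-activation. A small sketch, with hypothetical parameters chosen so that at x = 1.0 units 1 and 3 are active and unit 2 is inactive, mirroring the shaded region on the slide:

```python
import numpy as np

def active_units(x, theta):
    """Boolean activation pattern: True where the pre-activation
    theta_d0 + theta_d1 * x is positive (the ReLU lets it through)."""
    return (theta[:, 0] + theta[:, 1] * x) > 0

# hypothetical parameters (not from the slide's figure)
theta = np.array([[0.5, 1.0], [0.5, -1.0], [-0.5, 1.0]])
print(active_units(1.0, theta))  # [ True False  True]
```

Within a region where the pattern is constant, the network is a single linear function of x; the pattern changes only at the "joints".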

Depicting Neural Nets

Each parameter multiplies its source and adds to its target

Universal Approximation Theorem

With three hidden units, like before:

With D hidden units:

    y = ϕ₀ + Σ_{d=1}^{D} ϕ_d a[θ_{d0} + θ_{d1}x]

With enough hidden units…

…we can describe any 1D function to arbitrary accuracy

Universal approximation theorem

“A formal proof that, with enough hidden units, a shallow neural network can describe any continuous function on a compact subset of ℝᴰ to arbitrary precision”
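The 1D intuition behind the theorem can be made concrete: place one ReLU "joint" at each of D knot points and choose the weights so the network linearly interpolates the target function there. This is an illustrative construction (not the proof itself), and the helper names are my own:

```python
import numpy as np

def relu_interpolator(f, knots):
    """Build a shallow ReLU network that linearly interpolates f at the knots.

    Returns (phi0, betas) such that
        g(x) = phi0 + sum_d betas[d] * max(x - knots[d], 0)
    passes through (k, f(k)) for every knot k. Each beta is the change
    in slope at its knot, so more knots -> more linear regions.
    """
    ys = f(knots)
    slopes = np.diff(ys) / np.diff(knots)              # slope on each segment
    betas = np.concatenate([[slopes[0]], np.diff(slopes)])
    return ys[0], betas

def evaluate(x, phi0, betas, knots):
    """Evaluate the piecewise-linear network at x."""
    return phi0 + betas @ np.maximum(x - knots[:len(betas)], 0.0)
```

Running this with more and more knots drives the worst-case error on a smooth target toward zero, which is the content of the theorem for one dimension.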

More than 1 output

1 input, 4 hidden units, 2 outputs

More than 1 input

2 inputs, 3 hidden units, 1 output

Arbitrary inputs, hidden units, outputs

Dₒ outputs, D hidden units, and Dᵢ inputs

e.g. three inputs, three hidden units, two outputs
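In the general case the per-unit equations collapse into two matrix-vector steps. A minimal sketch (variable names `Theta`, `Phi` etc. are my own shorthand for the slide's parameters):

```python
import numpy as np

def shallow_net(x, Theta, theta0, Phi, phi0):
    """General shallow network: D_i inputs, D hidden units, D_o outputs.

    x:      (D_i,)   input vector
    Theta:  (D, D_i) input-to-hidden weights;  theta0: (D,)   hidden biases
    Phi:    (D_o, D) hidden-to-output weights; phi0:   (D_o,) output biases
    """
    h = np.maximum(Theta @ x + theta0, 0.0)  # pre-activations, then ReLU
    return Phi @ h + phi0

# e.g. three inputs, three hidden units, two outputs (random parameters)
rng = np.random.default_rng(0)
y = shallow_net(rng.normal(size=3),
                rng.normal(size=(3, 3)), rng.normal(size=3),
                rng.normal(size=(2, 3)), rng.normal(size=2))
print(y.shape)  # (2,)
```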

Some nomenclature…

More nomenclature

  • Y-offsets = biases

  • Slopes = weights

  • Everything in one layer connected to everything in the next = fully connected network

  • No loops = feedforward network

  • Values after ReLU (activation functions) = activations

  • Values before ReLU = pre-activations

  • One hidden layer = shallow neural network

  • More than one hidden layer = deep neural network

  • Number of hidden units = network capacity
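The nomenclature maps directly onto variables in code. A toy fully connected, feedforward hidden layer, with made-up values:

```python
import numpy as np

x = np.array([1.0, -2.0])                            # inputs
W = np.array([[0.5, -0.5], [1.0, 1.0], [0.0, 2.0]])  # weights (slopes)
b = np.array([0.1, 0.0, -0.3])                       # biases (y-offsets)

pre_activations = W @ x + b                     # values before the ReLU
activations = np.maximum(pre_activations, 0.0)  # values after the ReLU

capacity = len(b)  # number of hidden units = network capacity (here 3)
```

One such hidden layer makes the network shallow; stacking more of them makes it deep.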

🧰 🔩 🛠

Other activation functions

Question for next time: What happens if we feed one neural network into another neural network?